Abstract

In a recent study the initial rise of the mutual information between the firing rates of N 
neurons and a set of p discrete stimuli has been analytically evaluated, under the assumption that 
neurons fire independently of one another to each stimulus and that each conditional distribution 
of firing rates is gaussian. Yet real stimuli or behavioural correlates are high-dimensional, with 
both discrete and continuously varying features. Moreover, the gaussian approximation implies 
negative firing rates, which is biologically implausible. Here, we generalize the analysis to the 
case where the stimulus or behavioural correlate has both a discrete and a continuous dimension, 
like orientation and shape could be in a visual stimulus, or type and direction in a motor action. 
The functional relationship between the firing patterns and the continuous correlate is expressed 
through the tuning curve of the neuron, using two different parameters to modulate its width 
and its flatness. In the case of large noise we evaluate the mutual information up to the quadratic 
approximation as a function of population size. We also show that in the limit of large N and 
assuming that neurons can discriminate between continuous values with a resolution $\Delta\vartheta$, the
mutual information grows to infinity like $\log_2(1/\Delta\vartheta)$ when $\Delta\vartheta$ goes to zero. Then we consider
a more realistic distribution of firing rates, truncated at zero, and we prove that the resulting 
correction, with respect to the gaussian firing rates, can be expressed simply as a renormalization 
of the noise parameter. Finally, we demonstrate the effect of averaging the distribution across 
the discrete dimension, evaluating the mutual information only with respect to the continuously 
varying correlate. 

Introduction 

The strategy used by populations of neurons to code for external stimuli or behavioural correlates 
is a major issue which has been recently investigated both through data analysis and theoretical 
modelling. The mutual information between external correlates and the spiking activity of the 
population is one way to assess such coding quantitatively [1]. Several analyses have focused on
the coding of a discrete set of stimuli ([2]; see [3] for a review), which is the paradigm used in
many experiments [4, 5, 6, 7]. In this situation the mutual information is bounded by the entropy
of the stimulus set. Some theoretical studies have also considered the coding of stimuli varying in
a continuous domain [8, 9], which is interesting with respect to basic properties like orientation in
visual stimuli, frequency in auditory stimuli, velocity and position in motor actions. In particular in
[^] the authors have studied the asymptotic (large population) behaviour of the mutual information, 
with respect to a stimulus with a continuously varying dimension. 

Yet no study has been proposed so far considering a mixture of both continuous and discrete 
features, which is obviously closer to real world stimuli or behavioural correlates. Moreover the 
initial rise of the mutual information for small but increasing population size is more relevant for a 
comparison with estimates from real data, at least as far as the possibility of having simultaneous 
recordings from very large populations of neurons is restricted to very few cases. 

We have recently analyzed data recorded in the motor areas of behaving monkeys, in the 
laboratory of Eilon Vaadia. The monkeys moved a manipulandum in several possible directions 
(approximating a continuous correlate) and with different combinations of arms (4 types of move- 
ment, i.e. a discrete correlate). In trying to characterize the neural coding of these movements, 
we were particularly interested in whether, as it is reasonable to expect, different motor areas 
differ, at least quantitatively, in their coding properties. The results, obtained from records of
activity in areas M1 and SMA, will be reported elsewhere [10]; but they suggest the importance
of developing theoretical models of how populations of neurons might simultaneously code discrete
and continuous correlates. For example, one clear conclusion has been that type and direction are
not independent dimensions of the movement, in the sense that the information about
direction, extracted from the activity recorded with all movement types, is much lower than (roughly
half of) the average information about direction obtained with a single movement type.

Can we embody similar properties in a model of the scheme used by neurons to code movements? 
What would then be the dependence of the mutual information on the number of the possible 
types of movement? How would it depend on the resolution with which the continuous correlate is 
sampled? How on the level of noise affecting the firing patterns? 

In a recent work [2] some of these questions have been investigated for a set of p discrete stimuli,
under the assumption that neurons fire independently of one another to each stimulus and that 
the distribution of the firing rates is gaussian. The linear and the quadratic approximations to the 
mutual information as a function of population size were studied analytically, in the limit of large 
noise, as well as the approach to the ceiling in the case of small noise. We generalize this study 
considering both a discrete and a continuous dimension in the stimulus, referring specifically to 
motor actions characterized by a direction and a "type". Nonetheless, our model is equally applicable
to other complex correlates. We introduce a conditional firing rate distribution that is more realistic
than the simple gaussian one, and find a simple resulting correction to the gaussian model: the
analytical expression of the mutual information remains the same, except for a renormalization of
the expansion parameter.

We then evaluate the information loss when the original activity distribution is averaged across
the discrete correlate, as is sometimes the case in the analysis of real data, and the mutual information
is evaluated solely with respect to the continuously varying feature. Averaging out dimensions
in the stimulus corresponds to losing accuracy in its description, hence the information loss.

Our theoretical analysis allows a direct comparison with real curves; we present one comparison 
and discuss possible causes for the discrepancy between data and model. In particular, correlations 
between neurons, that are not included in the model, might play a relevant role, enhancing or 
decreasing redundancy in population coding [11]. This issue will be the object of future work.

Figure 1: Directional tuning for a cell recorded in the right supplementary motor area of a monkey
performing 4 different types of arm movement. UniLt = unimanual left; UniRt = unimanual
right; BiSym = bimanual symmetric; BiOpp = bimanual opposite. Notice that the type of movement
strongly modulates the amplitude of directional tuning, but not so much its preferred direction.

The gaussian model 

First, we consider a coding scheme where the distribution of the firing rates conditional to the 
movements is gaussian, similarly to the case examined in [2]. This assumption implies that negative
rates have a non zero probability to occur, but it allows an easier analytical treatment. We will 
examine a more realistic scheme later on. 
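Consider the following distribution:

$$P(\{\eta_i\}|\vartheta,s) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\eta_i-\tilde{\eta}_i(\vartheta,s))^2}{2\sigma^2}\right]; \qquad (1)$$

$\eta_i$ is the firing rate of neuron $i$; $\vartheta$ and $s$ parameterize respectively the direction and the type of
movement; $\tilde{\eta}_i(\vartheta,s)$ is the average firing rate of the neuron with the movement parameterized by
$\vartheta, s$.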

In general, the directional tuning of real cells in motor cortices is modulated by the type of 
movement performed. We show an example of this modulation, with the typical shape of tuning 
curves, in fig. 1 (data kindly provided by Eilon Vaadia). The modulation of the preferred direction
looks weaker than the overall amplitude modulation. 

For our model we have considered the following function: 

Figure 2: Cosinusoidal tuning curves as in eqs. (2), (3). (a) $m = 1$; modulation for different values
of $\varepsilon_s$. (b) $\varepsilon_s = 1$; modulation for different values of $m$.



$\varepsilon_s^i$ is a quenched random variable distributed between 0 and 1. The meaning of eqs. (1), (2) is that
the firing rate of neuron $i$ given the movement parameterized by $(\vartheta,s)$ follows a gaussian distribution
centered around a tuning curve $\bar{\eta}_i(\vartheta)$ whose flatness is modulated through the parameters $\varepsilon_s^i$. If $\varepsilon_s^i$
is zero for some particular $s$, the firing of the cell does not depend, for that movement type, on the
direction of the movement. On the other hand, if $\varepsilon_s^i$ assumes a fixed value for all $s$, the directional
tuning is not modulated by the type of movement. Tuning curves with a cosinusoidal shape
have already been considered to model the directional selectivity of sensory neurons [?]. Fig. 2 shows
the amplitude of the tuning curve.

We neglect the modulation of the preferred direction with the type, as it would burden the 
analytical calculations; it will be the object of future analyses. 
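
As a hedged illustration of the tuning described above, the following Python sketch assumes a specific functional form that is not spelled out in this text: a mean rate $\varepsilon\,\eta^0\cos^{2m}((\vartheta-\vartheta^0)/2) + (1-\varepsilon)\,\eta^f$. The function names, the default parameter values and the half-angle argument of the cosine are assumptions made for illustration only.

```python
import math

def tuning_curve(theta, theta0, m=1, eta0=1.0):
    """Cosine tuning raised to the power 2m, peaked at the preferred direction theta0.
    The half-angle argument is an assumption of this sketch."""
    return eta0 * math.cos((theta - theta0) / 2.0) ** (2 * m)

def mean_rate(theta, eps, theta0=0.0, m=1, eta0=1.0, eta_f=0.5):
    """Assumed mean rate: eps interpolates between a flat response (eps = 0)
    and a fully tuned response (eps = 1)."""
    return eps * tuning_curve(theta, theta0, m, eta0) + (1.0 - eps) * eta_f

# eps = 0: the rate does not depend on direction
flat = [mean_rate(t, 0.0) for t in (0.0, 1.0, 2.0)]
# larger m: narrower tuning around the preferred direction
narrow = tuning_curve(math.pi / 2, 0.0, m=2)
wide = tuning_curve(math.pi / 2, 0.0, m=1)
```

With these assumptions, `eps` controls the flatness of the curve and `m` its width, which are the two roles the text assigns to $\varepsilon_s^i$ and $m$.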



Evaluation of the mutual information for the gaussian model 

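We are interested in the mutual information [12] between the neuronal firing rates and the movements:

$$I(\{\eta_i\},\vartheta\otimes s) = \left\langle \sum_{s=1}^{p}\int d\vartheta \int \prod_i d\eta_i\, P(\vartheta,s)\,P(\{\eta_i\}|\vartheta,s)\,\log_2\frac{P(\{\eta_i\}|\vartheta,s)}{P(\{\eta_i\})}\right\rangle_{\varepsilon,\vartheta^0}; \qquad (4)$$

where the distribution $P(\{\eta_i\}|\vartheta,s)$ is given in eq. (1) and $\langle\ldots\rangle_{\varepsilon,\vartheta^0}$ is a short notation for the average
across the quenched variables $\{\varepsilon_s^i\}$, $\{\vartheta_i^0\}$. We are not interested in a particular realization of
the tuning, but in the average across all its possible realizations.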
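
For a single neuron and a finite set of equiprobable stimuli, the integral defining the mutual information can be evaluated by brute force. The sketch below is only a numerical illustration of that definition, under assumptions of our own: gaussian noise of standard deviation `sigma`, a fixed list of mean rates (one per stimulus), and a simple Riemann sum on a rate grid. It does not perform the quenched average or the replica calculation.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mutual_information_bits(means, sigma, n_grid=2001):
    """MI (bits) between one noisy rate and equiprobable stimuli with the given mean rates."""
    lo = min(means) - 8.0 * sigma
    hi = max(means) + 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    weight = 1.0 / len(means)          # P(stimulus), assumed uniform
    total = 0.0
    for j in range(n_grid):
        eta = lo + j * step
        cond = [gaussian_pdf(eta, mu, sigma) for mu in means]
        marginal = weight * sum(cond)  # P(eta)
        for c in cond:
            if c > 0.0:                # 0 * log 0 is taken as 0
                total += weight * c * math.log2(c / marginal)
    return total * step

well_separated = mutual_information_bits([0.0, 20.0], sigma=1.0)  # close to 1 bit
large_noise = mutual_information_bits([0.0, 1.0], sigma=10.0)     # small but positive
```

Two well separated mean rates carry nearly one bit, the entropy of a two-stimulus set, while in the large-noise regime the information is small.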

Eq. (4) can be written as:

$$I(\{\eta_i\},\vartheta\otimes s) = H(\{\eta_i\}) - \langle H(\{\eta_i\}|\vartheta,s)\rangle_{\vartheta,s}; \qquad (5)$$

$$\langle H(\{\eta_i\}|\vartheta,s)\rangle_{\vartheta,s} = -\left\langle \sum_{s=1}^{p}\int d\vartheta\int\prod_i d\eta_i\, P(\vartheta,s)\,P(\{\eta_i\}|\vartheta,s)\,\log_2 P(\{\eta_i\}|\vartheta,s)\right\rangle_{\varepsilon,\vartheta^0}; \qquad (6)$$

The calculation of the equivocation $\langle H(\{\eta_i\}|\vartheta,s)\rangle_{\vartheta,s}$ is straightforward and the result is additive
in the population size:

The linearity in $N$ is standard whenever the conditional distribution of firing rates can be
factorized across neurons, as in eq. (1).
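
The additivity in $N$ can be checked with a short sketch. It assumes the standard sign convention in which the equivocation of a factorized gaussian is the sum of $N$ identical differential entropies, $\frac{1}{2}\log_2(2\pi e\sigma^2)$ each; this closed form is a textbook property of the gaussian and is quoted here as an assumption, not as the expression printed in the original equation. The function names are ours.

```python
import math

def gaussian_equivocation_bits(n_neurons, sigma):
    """Differential entropy (bits) of n independent gaussian rates with standard deviation sigma."""
    return 0.5 * n_neurons * math.log2(2.0 * math.pi * math.e * sigma ** 2)

def single_neuron_entropy_numeric(sigma, n_grid=20001, half_width=10.0):
    """Riemann-sum estimate of -int p log2 p for one gaussian, as a cross-check."""
    step = 2.0 * half_width * sigma / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        x = -half_width * sigma + j * step
        p = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total -= p * math.log2(p) * step
    return total

closed_form = gaussian_equivocation_bits(1, 2.0)
numeric = single_neuron_entropy_numeric(2.0)
```

The entropy does not depend on the mean rate, which is why the quenched average leaves this term untouched in the gaussian model.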

The evaluation of the rate entropy $H(\{\eta_i\})$ can be carried out introducing $n$ replicas [13, 14] for
both the continuous and the discrete dimensions, $\vartheta$ and $s$, which allows one to get rid of the logarithm
in eq. (7):

Integrating over $\{\eta_i\}$ and rearranging terms one obtains:

Large $\sigma$ limit

An exact analytical evaluation of the resulting expression is not possible without resorting to some approximation. In
line with the analysis performed in [2] we assume now that the quenched randomness is uncorrelated
and identically distributed across neurons:



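$$Q(\{\varepsilon_s^i\}) = \prod_{i=1}^{N} Q(\{\varepsilon_s\});$$

with the preferred directions $\{\vartheta_i^0\}$ independently and uniformly distributed, so that their joint density is $1/(2\pi)^N$.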



We assume also that ??/ = r]^ Mi. Then one can write: 
We consider now the limit of large noise $\sigma$; in this case, since $R \sim 1/\sigma^2$, we can expand $\exp(-R)$;
keeping only terms of order $(N/\sigma^k)^l$, with $k \le 2$ and $l = 1, 2$, we obtain:

To first order in $N/\sigma^2$ we obtain:

This result is valid for a generic directional tuning curve $\bar{\eta}(\vartheta)$. We consider now our specific
choice, eq. (3), examining the simplest case $m = 1$ first. After averaging across direction selectivities
$\{\vartheta^0\}$ and integrating over continuous replicas we obtain

where we have defined $\alpha = \eta^f/\eta^0$. To perform the average across the quenched variables $\{\varepsilon_s\}$ we
assume that they are identically distributed across the $p$ movement types, namely:

$$Q(\{\varepsilon_s\}) = \prod_{s} Q(\varepsilon_s) = [Q(\varepsilon)]^p;$$

In this case the sum over the indexes $k$ and $l$ generates $n(n+1)$ identical terms. The summation
on discrete replicas yields a factor $p^{n+1}$ multiplying the term $\langle(\varepsilon_{s_k})^2\rangle_\varepsilon$, and a factor $p^{n}(p-1)$
multiplying the term $\langle(\varepsilon_{s_k}-\varepsilon_{s_l})^2\rangle_\varepsilon$, since this last term is non zero only when $s_k \neq s_l$.

Taking the limit $n \to 0$ yields

$$\lambda_2 = \int d\varepsilon\, Q(\varepsilon)\,\varepsilon^2; \qquad (16)$$

From the equivocation, eq. (8), and the entropy of the responses, the final expression for the mutual information can be written

In the more general case of a power $2m$ of the cosine, in eq. (3), it is easy to show that the final
result can be expressed as

The calculation of the coefficient of the second order term, which multiplies $N^2/\sigma^4$ in eq. (12),
is slightly more complex and implies integration of terms with 4-replica interaction. The detailed
analytical evaluation is given in the appendix. The final result up to the quadratic approximation
reads:

where the expressions of $\lambda_1, \lambda_2, A_1, A_2$ are given respectively in eqs. (15), (16), (19), (20). In the limit
of large $p$ we have

Figure 3: Information rise, from eq. (22), for different values of the expansion parameter
$(\eta^0/2\sigma)^2$; $m = 1$; $p = 4$; the distribution $Q(\varepsilon)$ in eqs. (15), (16) is just equal to 1/3 for each of the 3
allowed $\varepsilon$ values of 0, 1/2 and 1.

Figure 4: Mutual information in linear and quadratic approximation as in eq. (22), for a sample of
10 cells; expansion parameter equal to 0.7; the distribution $Q(\varepsilon)$ in eqs. (15), (16) is just equal to 1/3 for each of the three
allowed $\varepsilon$ values of 0, 1/2 and 1. (a) Dependence on the number of movement types $p$. Dotted
lines are for the asymptotes, eq. (23); $m = 1$. (b) Dependence on the power $m$ of the cosine in eq. (3);
$p = 4$.

Fig. 3 shows the linear and the quadratic approximations of eq. (22) for different values of the
expansion parameter $(\eta^0/2\sigma)^2$. It is easy to see that, for very small values of this parameter, linear
and quadratic approximations roughly coincide, while when it exceeds 0.8 the quadratic approximation
begins to fail and one should add higher orders in perturbation theory.

Fig. 4(a) shows the dependence of the mutual information on the number of types $p$. The
dependence on $p$ is weak for the linear approximation and somewhat stronger for the quadratic
one. In both cases an increase in the number of discrete correlates $p$ produces an increase in the
mutual information. The distance between the linear and the quadratic approximation remains
asymptotically finite (for $p \to \infty$), contrary to what happens in the case of discrete stimuli alone
[2].

Fig. 4(b) shows the dependence of the mutual information on the width of the directional tuning
(see fig. 2(b)). Since we are considering the case when the noise $\sigma$ is large, a very narrow tuning
in $\vartheta$ corresponds to a larger overlap in the conditional probabilities $P(\eta|\vartheta,s)$, for most angles $\vartheta$. A
consequence is that, especially in the linear approximation, the mutual information is a (slowly)
decreasing function of $m$.

The limit of large N 

We consider now the case when the number of neurons is large. Since we deal with an infinite 
number of stimuli the mutual information is unbounded. Thus we expect that when the number 
of neurons becomes large and the noise is finite the mutual information tends asymptotically to 
infinity. 

In order to study this limit we discretize the $\{\vartheta\}$ space into a finite set of $M = 2\pi/\Delta\vartheta$ angles
$\vartheta_1 \ldots \vartheta_M$, and then we take the limit $\Delta\vartheta \to 0$.

The entropy of the responses $H(\{\eta_i\})$ can be written

and we discretize the average across the directional selectivities as well

This situation corresponds to the case when each neuron can discriminate across different angles
$\vartheta_1 \ldots \vartheta_M$ with a resolution $\Delta\vartheta$.

The calculation can be carried out introducing replicas as in the previous case. One gets

and we have assumed symmetry across neurons in the quenched randomness and in the parameters
characterizing the conditional distribution $P(\eta_i|s,\vartheta_k)$.

Now we take the limit $N \to \infty$. As is evident from eq. (27), $\exp(-R) \le 1$, and $\exp(-R) = 1$
when $s_l = s_m$ and $k_m = k_l$ for each pair of indexes $(m,l)$. Thus when $N \to \infty$ the only terms which
survive in the sum on replicas are the ones with $s_1 = s_2 = \ldots = s_{n+1}$ and $k_1 = k_2 = \ldots = k_{n+1}$. Since we
have $p$ stimuli $s$ and $M$ stimuli $\vartheta_k$, the total number of terms is $Mp$. Substituting this value in the
sums over replicas in eq. (26) and putting $\exp(-R) = 1$ one obtains an expression for the entropy
of the responses $H(\{\eta_i\})$, which combined with the equivocation as in eq. (8) gives the final result for
the mutual information:

$$I(\{\eta_i\},\vartheta\otimes s) = \log_2(p) + \log_2(M); \qquad (28)$$

Recall that $M = 2\pi/\Delta\vartheta$. Taking the limit to continuous angles, $\Delta\vartheta \to 0$, it is easy
to see that asymptotically the mutual information tends logarithmically to infinity.
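
The logarithmic growth of eq. (28) is easy to tabulate. The short sketch below simply evaluates $\log_2 p + \log_2(2\pi/\Delta\vartheta)$; the function name and the sample values of $p$ and $\Delta\vartheta$ are chosen here for illustration.

```python
import math

def information_ceiling_bits(p, delta_theta):
    """Large-N limit of eq. (28): log2(p) + log2(M), with M = 2*pi / delta_theta."""
    n_angles = 2.0 * math.pi / delta_theta
    return math.log2(p) + math.log2(n_angles)

coarse = information_ceiling_bits(4, 2.0 * math.pi / 8)   # p = 4 types, M = 8 angles
fine = information_ceiling_bits(4, 2.0 * math.pi / 16)    # halving the resolution step
```

Each halving of $\Delta\vartheta$ adds one bit, so the ceiling diverges like $\log_2(1/\Delta\vartheta)$.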



Beyond the gaussian assumption: the tg model 

So far we have considered the case where the rate distribution for each neuron is normal. This
assumption implies that negative rates have a non zero probability to occur; the smaller the average
rate, the larger this probability. The bias introduced by the
inclusion of negative rates in the space of possible states might be even more serious since we have
considered the limit of large noise, where the tail of the distribution in the domain $\eta < 0$ acquires
a significant weight.

Cutting the distribution at zero is not enough to assign the proper weight to under-threshold 
activity: each time the summation of the inputs coming from other units is lower than threshold 
the neuron remains silent, and this occurs with a well defined probability. 

A natural choice for the rate distribution $P(\eta_i|\vartheta,s)$ is a thresholded gaussian plus a $\delta$ peak in
zero (tg model):

$$P(\eta_i|\vartheta,s) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-(\eta_i-\tilde{\eta}_i(\vartheta,s))^2/2\sigma^2\right]\Theta(\eta_i) + 2\left(1-\mathrm{erf}(\tilde{\eta}_i(\vartheta,s)/\sigma)\right)\delta(\eta_i)\,\Theta(-\eta_i) \qquad (29)$$

where $\Theta(x)$ is the Heaviside step function, $\tilde{\eta}_i(\vartheta,s)$ is the same as defined in eq. (2) and $\mathrm{erf}(x)$ is the
error function:

$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} dt\, e^{-t^2/2}. \qquad (30)$$

The factor multiplying the $\delta$ function ensures a correct normalization and it assigns the proper
weight to the peak in zero, which is larger the closer the average rate $\tilde{\eta}_i(\vartheta,s)$ is to zero. A
similar distribution has already been considered in networks of threshold linear neurons [16].
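
A rate distributed as a thresholded gaussian with a peak at zero can be sampled by drawing from the gaussian and moving every negative draw onto zero. The sketch below illustrates this reading of the tg model and checks the weight of the peak, $1 - \Phi(\tilde{\eta}/\sigma)$ with $\Phi$ the cumulative gaussian; the function names, sample size and seed are assumptions of this example.

```python
import math
import random

def silent_probability(mean, sigma):
    """Weight of the peak at zero: probability that the underlying gaussian draw is negative."""
    return 0.5 * math.erfc(mean / (sigma * math.sqrt(2.0)))

def sample_tg(mean, sigma, rng):
    """Draw one rate: gaussian around `mean`, with negative draws collapsed onto zero."""
    return max(0.0, rng.gauss(mean, sigma))

rng = random.Random(0)  # fixed seed, for reproducibility only
samples = [sample_tg(0.5, 1.0, rng) for _ in range(100000)]
silent_fraction = sum(1 for x in samples if x == 0.0) / len(samples)
```

The empirical fraction of silent draws should agree with `silent_probability(0.5, 1.0)` up to sampling noise, and it grows as the mean rate approaches zero.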



The analytical evaluation of the mutual information is obviously more difficult than in the case
of the simple gaussian, because of the presence of the error function, which cannot be integrated
exactly. Nonetheless in the limit of large $\sigma$ it is possible to evaluate both the linear and the
quadratic approximation in $N$ and thus quantify the impact of the correction with respect to the
gaussian case, eq. (22).

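The limit of large $\sigma$ for the tg model: the equivocation

We recall the expression of the equivocation, eq. (6):

$$\langle H(\{\eta_i\}|\vartheta,s)\rangle_{\vartheta,s} = -\left\langle\sum_{s=1}^{p}\int d\vartheta\int\prod_i d\eta_i\,P(\vartheta,s)\,P(\{\eta_i\}|\vartheta,s)\,\log_2 P(\{\eta_i\}|\vartheta,s)\right\rangle_{\varepsilon,\vartheta^0}. \qquad (31)$$

Assuming independence among neurons in the conditional probability $P(\{\eta_i\}|\vartheta,s)$, eq. (31) can
be written

$$\langle H(\{\eta_i\}|\vartheta,s)\rangle_{\vartheta,s} = -\frac{N}{\ln 2}\sum_{s=1}^{p}\int d\vartheta\,P(\vartheta,s)\left\langle\int d\eta\,P(\eta|\vartheta,s)\ln P(\eta|\vartheta,s)\right\rangle_{\varepsilon,\vartheta^0}. \qquad (32)$$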



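In the specific case of the distribution (29) it is easy to show that

$$\left\langle\int d\eta\,P(\eta|\vartheta,s)\ln P(\eta|\vartheta,s)\right\rangle_{\varepsilon,\vartheta^0} = \left\langle\int_{0}^{\infty} d\eta\,\frac{e^{-(\eta-\tilde{\eta}(\vartheta,s))^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\left[-\frac{(\eta-\tilde{\eta}(\vartheta,s))^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi\sigma^2)\right]\right\rangle_{\varepsilon,\vartheta^0}$$
$$+ \left\langle\int d\eta\,\delta(\eta)\left(1-\mathrm{erf}(\tilde{\eta}(\vartheta,s)/\sigma)\right)\left[\ln\delta(\eta)+\ln 2+\ln\left(1-\mathrm{erf}(\tilde{\eta}(\vartheta,s)/\sigma)\right)\right]\right\rangle_{\varepsilon,\vartheta^0}. \qquad (33)$$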

To proceed with the calculation we have to be careful with the integration of the delta function.
In fact it is easy to show that the integration of the product $\delta(x)\ln\delta(x)$ yields a logarithmic
divergence. Since the mutual information must remain finite with a finite number of neurons $N$,
we expect this divergence to cancel exactly with an analogous term in the rate entropy and, in fact,
in the next section, we will show that this is the case. For the moment we use the equality

$$\int_{-\infty}^{\infty} dx\,\delta(x)F(x) = \lim_{\epsilon\to 0}\int_{-\epsilon/2}^{\epsilon/2} dx\,\frac{1}{\epsilon}F(x). \qquad (34)$$

Assuming as usual that the quenched disorder is identically distributed across neurons and
stimuli and that $\tilde{\eta}(\vartheta,s)$ is as in eq. (2), we can write:

The average across quenched disorder cannot be performed if we do not resort to some approximation.
Since we have already focused on the limit of large $\sigma$ it is natural to consider an expansion
of the error function in eq. (30) for a small value of its argument:

Approximating all the error functions in the expression for the equivocation we obtain:

where in line with the approximation used in the case of the simple gaussian distribution we have
omitted terms of order $N/\sigma^k$ with $k > 2$.
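
The quality of a small-argument expansion of the error function can be checked numerically. The sketch below assumes that the error function is the cumulative gaussian, as in the definition with $e^{-t^2/2}$ in the integrand, and keeps only the first-order term, $1/2 + x/\sqrt{2\pi}$; the order actually retained in the analytical calculation may differ, so this is an illustration of the regime of validity rather than a reproduction of the expansion used above.

```python
import math

def cumulative_gaussian(x):
    """Cumulative gaussian, written through the standard library error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def first_order_expansion(x):
    """First-order expansion of the cumulative gaussian around x = 0."""
    return 0.5 + x / math.sqrt(2.0 * math.pi)

error_small = abs(cumulative_gaussian(0.05) - first_order_expansion(0.05))
error_large = abs(cumulative_gaussian(1.0) - first_order_expansion(1.0))
```

The discrepancy is negligible when the argument (mean rate over noise) is small and becomes sizeable when it is of order one, consistent with restricting the analysis to large $\sigma$.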

Evaluation of the mutual information 

We reconsider eq. (7). Using replicas and assuming that the quenched randomness is identically
distributed across neurons we obtain

where $P(\eta|\vartheta,s)$ is given in eq. (29). Integrating over $d\eta$ yields

where we have used eq. (34) to integrate the $\delta$ function in eq. (29), and the expression of $R$ is as
in eq. (11). Using the small-argument approximation for the error function and considering the expansion for
small $n$

we obtain

It is simple to verify that $C$ remains finite when $n$ goes to zero. Now we expand in powers of
$N$ up to the second order:

where we have omitted terms that are $o(n)$ when $n \to 0$. This quantity has to be summed over
continuous and discrete replicas, after having explicitly performed the average across quenched
disorder. It is easy to show that $C$ cancels exactly with analogous terms in the
equivocation, eq. (37). The evaluation of the linear and quadratic terms in the mutual information
can be performed with a technique similar to the one used in the case of the gaussian distribution
and it involves averaging terms with 2- and 4-replica interactions.
The final expression for the mutual information in the linear and quadratic approximation reads 

Comparing the expressions obtained for the two models, it is evident that modifying the gaussian model into the more
realistic tg model has no effect on the analytical expression of the mutual information, except for
a renormalization of the expansion parameter $(\eta^0/2\sigma)^2$:

Fig. 5 shows the effect of the renormalization for different values of the expansion parameter. The mutual information
is lower in the tg model than in the gaussian approximation, as expected.

We have explored whether the two models can fit the information rise estimated from real data. 
Since the analytical expression of the mutual information is the same in both cases the fit does not 
change between the two models. 

Figure 5: Information rise as in eqs. (22), (46) for different values of the expansion parameter
$(\eta^0/2\sigma)^2$; $m = 1$; $p = 4$; the distribution $Q(\varepsilon)$ in eqs. (15), (16) is just equal to 1/3 for each of
the 3 allowed $\varepsilon$ values of 0, 1/2 and 1.

Figure 6: Comparison between the theoretical curves, eqs. (22), (46), and the information estimated
from a sample of cells recorded in the right primary motor cortex [10]; $m = 1$; $p = 2$; the distribution
$Q(\varepsilon)$ in eqs. (15), (16) is just equal to 1/3 for each of the three allowed $\varepsilon$ values of 0, 1/2, 1; the values of
$(\eta^0/2\sigma)^2$ used for the fit are 0.64 for the gaussian model and 0.78 for the tg model.

Fig. 6 shows the comparison between the information estimated from a sample of cells recorded
in the right primary motor cortex [10] and the prediction given by either of the two theoretical
distributions. In the limit of large $\sigma$ the gaussian model fails to provide a good fit, but we can
conclude that this failure is not due to the inclusion of negative rates in the distribution.



Information loss in averaging activity distributions 

Fig. 1 suggests that the directional tuning of real cells is modulated, albeit moderately, by the type
of movement. In fact the analysis of real data has shown that the coding of the direction is not
unique, but is specific to the complex correlate being considered [10] (here, which arm
moves).

More in general, distinct features characterizing a complex stimulus are not expected to be coded 
independently of one another. This raises the question of how central representations of external 
correlates are constructed and which are the basic featural components of these representations. 
Of course the categorization of natural stimuli is arbitrary and the more accurate a description is 
provided, the higher the dimensionality of the stimulus set. Since an infinite number of different 
descriptors could be chosen to characterize a stimulus, any (finite) categorization has the effect 
of emphasizing some relevant features and averaging out other irrelevant features. An obvious 
consequence is that we end up, even involuntarily, evaluating how some features are coded on 
average, with respect to the dimension we have chosen, explicitly or implicitly, to neglect. 

Thus, with correlates which have one continuous and one discrete dimension, one might ask
how three quantities are related: the information carried about the full continuous+discrete
correlate; the information carried about the continuous dimension, disregarding
the discrete one; and the information carried about the continuous dimension when a
single value of the discrete dimension is fixed while recording neural activity. In other
words, suppose that we investigate how the direction is coded on average across different types of
movement. This corresponds to averaging the full distribution $P(\{\eta_i\}|\vartheta,s)$ on $s$:

$$P(\{\eta_i\}|\vartheta,s) \to \sum_{s} P(s)\,P(\{\eta_i\}|\vartheta,s) = P_s(\{\eta_i\}|\vartheta). \qquad (47)$$

The resulting expression of the mutual information is:

$$I(\{\eta_i\},\langle\vartheta\rangle_s) = \left\langle\int d\vartheta\int\prod_i d\eta_i\,P(\vartheta)\,P_s(\{\eta_i\}|\vartheta)\,\log_2\frac{P_s(\{\eta_i\}|\vartheta)}{P(\{\eta_i\})}\right\rangle_{\varepsilon,\vartheta^0}. \qquad (48)$$

The analytical evaluation is very similar to the cases already discussed.
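
The effect of the averaging in eq. (47) can be illustrated numerically for a single neuron. The sketch below is not the analytical calculation: it assumes gaussian noise, equiprobable directions and types, a discretized set of directions, one fixed preferred direction, and a mean rate of the form $\varepsilon_s\,\eta^0\cos^2(\vartheta/2) + (1-\varepsilon_s)\,\eta^f$, all of which are assumptions made here for illustration. It compares the information about direction and type with the information about direction alone, after averaging the conditional distribution over the type.

```python
import math

def pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def info_bits(cond_rows, step):
    """MI (bits) for equiprobable stimuli; cond_rows[k][j] = P(eta_j | stimulus k)."""
    n_stim = len(cond_rows)
    total = 0.0
    for j in range(len(cond_rows[0])):
        marginal = sum(row[j] for row in cond_rows) / n_stim
        for row in cond_rows:
            if row[j] > 0.0:  # 0 * log 0 is taken as 0
                total += row[j] * math.log2(row[j] / marginal) / n_stim
    return total * step

def full_and_averaged_info(eps_values, sigma, n_angles=12, n_grid=801, eta0=1.0, eta_f=0.5):
    """Return (I about direction and type, I about direction after averaging over type)."""
    angles = [2.0 * math.pi * k / n_angles for k in range(n_angles)]
    # assumed mean rate: eps * eta0 * cos^2(theta / 2) + (1 - eps) * eta_f
    means = [[e * eta0 * math.cos(a / 2.0) ** 2 + (1.0 - e) * eta_f for a in angles]
             for e in eps_values]
    flat = [mu for row in means for mu in row]
    lo, hi = min(flat) - 8.0 * sigma, max(flat) + 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + j * step for j in range(n_grid)]
    # one conditional density per (type, direction) pair
    cond = [[[pdf(x, mu, sigma) for x in grid] for mu in row] for row in means]
    full_rows = [dens for row in cond for dens in row]
    # average over the discrete type at fixed direction
    p = len(eps_values)
    averaged_rows = [[sum(cond[s][k][j] for s in range(p)) / p for j in range(n_grid)]
                     for k in range(n_angles)]
    return info_bits(full_rows, step), info_bits(averaged_rows, step)

full, averaged = full_and_averaged_info([0.0, 0.5, 1.0], sigma=0.5)
same_full, same_averaged = full_and_averaged_info([1.0, 1.0], sigma=0.5)
```

When the types modulate the tuning differently, the averaged information falls below the full one; when all types share the same tuning, nothing is lost by averaging.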

As usual, the mutual information can be expressed as the difference between the entropy of the
responses $H(\{\eta_i\})$ and the equivocation $\langle H(\{\eta_i\}|\vartheta)\rangle_{\vartheta}$, where

$$\langle H(\{\eta_i\}|\vartheta)\rangle_{\vartheta} = -\left\langle\int d\vartheta\int\prod_i d\eta_i\,P(\vartheta)\,P_s(\{\eta_i\}|\vartheta)\,\log_2 P_s(\{\eta_i\}|\vartheta)\right\rangle_{\varepsilon,\vartheta^0}; \qquad (49)$$

We focus on the more realistic tg model, eq. (29); the entropy of the responses is obviously
independent of the chosen categorization of the stimuli; in the limit of large $\sigma$ the procedure is
precisely the one followed in the previous section.

The difference with respect to the cases already discussed is in the evaluation of the equivocation:
since $P_s(\{\eta_i\}|\vartheta)$ is obtained averaging the distribution (29) across $s$, we need to introduce
discrete replicas $s_1 \ldots s_n$, as when evaluating the entropy of the responses. Then the calculation is
straightforward and the basic steps are given in the previous section and in the appendix.

Since in one case (for the entropy of the responses) we sum over both discrete and continuous
replicas and in the other case (for the equivocation) we sum only over discrete replicas, it is clear
that all terms that do not involve 2 or more interacting replicas cancel out.

In more detail, while the evaluation of the entropy of the responses requires the analytical calculation
of averages like $\langle\tilde{\eta}(\vartheta_k)\tilde{\eta}(\vartheta_l)\rangle_{\vartheta^0}$ (see the appendix), these averages disappear in the evaluation of the
equivocation, since replica indexes are only for the discrete variable $s$.

The final result for the mutual information up to the quadratic approximation reads: 



N\rff n 1 



|^^8 {a - Ai)^ - (^A2 - 2aAi 



+ 



^ (A, - A,)^ + 
P 



P 



4 m— 1 

r) E 

v=0 





(51) 



Fig. 7(a) compares the averaged information, $I(\{\eta_i\},\langle\vartheta\rangle_s)$, with the full information $I(\{\eta_i\},\vartheta\otimes s)$,
where in the case of $I(\{\eta_i\},\vartheta\otimes s)$ we have put $p = 1$. The full information calculated with $p = 1$
in fact gives the curve one would obtain, on average, by considering only one movement type (or
value of the discrete correlate) at a time. As one could expect, averaging the distribution across
the discrete correlates results in an information loss. Moreover, the full information with $p = 4$
movement types is obviously above the specific one, obtained by setting $p = 1$ (compare figs. 3 and
7(a)).

Fig. 7(b) shows the dependence of the full and averaged information on the number $p$ of discrete
correlates. Contrary to the full information, the averaged information decreases monotonically with
$p$, both in the linear and in the quadratic approximation.

As one would expect, averaging the distribution across a large number $p$ of correlates is equivalent
to a regularization of the activity distribution, which results in a lower mutual information.


Discussion 

We have studied a model of the coding of discrete+continuous stimuli by a population of N neurons,
referring to the specific case of movements categorized according to a direction and a type. We have 
shown that, asymptotically in the limit of large populations of neurons, the mutual information 
tends to infinity logarithmically with the resolution in the continuous dimension. This result aside, 
we have focused on the initial rise of the mutual information with the population size, which may 
offer a more direct comparison with the analysis of real data. 

Figure 7: Mutual information in the linear and quadratic approximations, as in eqs. (22), (51), for
a sample of 10 cells; expansion parameter equal to 0.7; $m = 1$; the distribution $Q(\varepsilon)$ in eqs. (15), (16) is just equal to 1/3 for
each of the 3 allowed $\varepsilon$ values of 0, 1/2 and 1. (a) Full curve as a function of the population size (number of cells);
$p = 1$ for the full information $I(\{\eta_i\},\vartheta\otimes s)$ and $p = 4$ for the averaged information $I(\{\eta_i\},\langle\vartheta\rangle_s)$.
(b) Dependence on the number of movement types $p$. Dotted lines are for the asymptotes, $p \to \infty$.



In the limit of large noise we have therefore derived an analytical formula for such initial rise 
of the information, up to the quadratic approximation. We have examined the dependence on the 
number of discrete correlates and on the width of the directional tuning. A comparison with the 
information estimated from real data, in which the linear term is used as a fit coefficient, has shown 
that the quadratic approximation fails to capture the deviation from linearity. 

We have then considered a more realistic model for the conditional firing distribution, a thresholded
gaussian with a $\delta$-peak. We have shown that this more realistic distribution simply renormalizes
the expansion parameter applicable to the gaussian distribution. Therefore, the discrepancy
in the fit to real data does not originate in the firing rate distribution. There are several possible
reasons for this discrepancy:

• the value of the expansion parameter $(\eta^0/2\sigma)^2$ corresponding to the best fit in fig. 6
is quite high (0.78). This value is in the range where the quadratic approximation is
expected to fail on its own (see figs. 3, 5). Adding higher orders in perturbation theory
might improve the fit. Moreover, we have neglected terms of order $N/\sigma^k$ with $k > 2$, which
might be non negligible when the expansion parameter becomes large and $N$ is not too small.

• information estimates from real data are often biased because of poor sampling. Several 
procedures have been proposed to correct the bias [^], but the improvement given by the 
correction is not precisely quantifiable, and sampling biases cannot be discarded for good. 

• both with the tg model and in the gaussian approximation, we have assumed that neurons fire 
independently of one another to each movement, but the analysis of extracellular recordings 
has shown that correlations may play a non-negligible role in the coding |11, O]. 



Finally, we have examined the effect of averaging the distribution across the discrete correlates, 
evaluating the mutual information with respect to the continuously varying dimension alone. As 
expected, averaging the distribution across s results in an information loss, which is more serious
the larger the number p of discrete correlates. 

Further developments of this work include: the introduction of weak correlations in the signal 
of different neurons and the analysis of the information transfer to other stages of processing, given 
this as the coding scheme in the input layer of the network. In the specific case of movement coding 
these research directions might help model how information is transmitted down the motor system, 
in the planning and execution of motor tasks. 



A Detailed evaluation of the second order coefficient 



We show here how to evaluate the coefficient of the quadratic term in eq.(12), that we write again: 




(52) 

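This quantity has to be integrated over continuous and discrete replicas. First we perform
the average across the quenched variable $\vartheta^0$. Expanding the products it is easy to see that the
quantities to be averaged are all like $\langle\tilde{\eta}(\vartheta_k)\rangle_{\vartheta^0}$ and $\langle\tilde{\eta}^2(\vartheta_k)\rangle_{\vartheta^0}$, that we have already calculated in
eqs. (19), (20). Another average that we need is $\langle\tilde{\eta}(\vartheta_k)\tilde{\eta}(\vartheta_l)\rangle_{\vartheta^0}$, with $k \neq l$:

$$\ldots\,\cos\left[(m-\nu)(\vartheta_k-\vartheta_l)\right] \qquad (53)$$

Thus only terms like $\langle\tilde{\eta}(\vartheta_k)\tilde{\eta}(\vartheta_l)\rangle_{\vartheta^0}$ still depend on continuous replicas.

Since terms like $\cos[(m-\nu)(\vartheta_k-\vartheta_l)]$, with $k \neq l$, are zero when integrated on $\vartheta_k, \vartheta_l$, the only
term which requires a careful evaluation in performing the integration on continuous replicas is the
product $\langle\tilde{\eta}(\vartheta_k)\tilde{\eta}(\vartheta_l)\rangle_{\vartheta^0}\,\langle\tilde{\eta}(\vartheta_\rho)\tilde{\eta}(\vartheta_\mu)\rangle_{\vartheta^0}$.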

After integration on continuous replicas this term yields 

Rearranging all terms one has, finally

$$\ldots\,\left(\delta_{k\rho}\delta_{l\mu} + \delta_{k\mu}\delta_{l\rho}\right) \qquad (54)$$

where $A_1$ and $A_2$ are defined in eqs. (19) and (20). To correctly perform the summation across the
discrete replicas and to keep only terms of order $n$ in the limit $n \to 0$, we consider separately the
different contributions to the sum:
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E E ^ E + E i^ok + Sei)+ E i^f^k + S^.l)+ E ('^Mfe'^^pi + '^£>fc'^MO • (55) 

Since terms like $\langle(\varepsilon_{s_k}-\varepsilon_{s_l})^2\rangle_\varepsilon$ are zero if $s_k = s_l$, we have to distinguish between
different cases for each term in eq. (55) in performing the summation on discrete replicas:



E-E^E-E E -E+E-E E -E 

+E-E E - E+E-E E -E; (56) 

Sl SkJ^SiSg=Sm Sn + 1 Si Sk=SlSg = Sm S„ + l 

With a bit of combinatorics and keeping only the terms of order $n$, which give a finite contribution
in the limit $n \to 0$, we obtain the final result for the second order term

where the expressions of $\lambda_1, \lambda_2$ are given in eqs. (15), (16). Considering the result obtained at the
first order, eq. (18), it is easy to derive the expression for the mutual information up to the second
order in $N/\sigma^2$: